NSF PAR Search | NSF Public Access Repository

MouseScholar: Evaluating an Image+Text Search System for Biocuration

https://doi.org/10.1109/BIBM58861.2023.10385503

Trabucco, Juan Trelles; Floricel, Carla; Arighi, Cecilia; Shatkay, Hagit; Raciti, Daniela; Ringwald, Martin; Marai, G Elisabeta (December 2023, IEEE Xplore)

Biocuration is the process of analyzing biological or biomedical articles to organize biological data into data repositories using taxonomies and ontologies. Due to the expanding number of articles and the relatively small number of biocurators, automation is desired to improve the workflow of assessing articles worth curating. As figures convey essential information, automatically integrating images may improve curation. In this work, we instantiate and evaluate a first-in-kind, hybrid image+text document search system for biocuration. The system, MouseScholar, leverages an image modality taxonomy derived in collaboration with biocurators, together with figure segmentation and classifier components as a back-end, and a streamlined front-end interface to search and present document results. We formally evaluated the system with ten biocurators on a mouse genome informatics biocuration dataset and collected feedback. The results demonstrate the benefits of blending text and image information when presenting scientific articles for biocuration.
Full Text Available
Enhancing biomedical search interfaces with images

https://doi.org/10.1093/bioadv/vbad095

Trelles Trabucco, Juan; Arighi, Cecilia; Shatkay, Hagit; Marai, G. Elisabeta; Lengauer, Thomas (Ed.) (July 2023, Bioinformatics Advances)

Abstract Motivation: Figures in biomedical papers communicate essential information with the potential to identify relevant documents in biomedical and clinical settings. However, academic search interfaces mainly search over text fields. Results: We describe a search system for biomedical documents that leverages image modalities and an existing index server. We integrate a problem-specific taxonomy of image modalities and image-based data into a custom search system. Our solution features a front-end interface to enhance classical document search results with image-related data, including page thumbnails, figures, captions and image-modality information. We demonstrate the system on a subset of the CORD-19 document collection. A quantitative evaluation demonstrates higher precision and recall for biomedical document retrieval. A qualitative evaluation with domain experts further highlights our solution’s benefits to biomedical search. Availability and implementation: A demonstration is available at https://runachay.evl.uic.edu/scholar. Our code and image models can be accessed via github.com/uic-evl/bio-search. The dataset is continuously expanded.
Modality-Classification of Microscopy Images Using Shallow Variants of Deep Networks

https://doi.org/10.1109/BIBM49941.2020.9313467

Trabucco, Juan Trelles; Li, Pengyuan; Arighi, Cecilia; Shatkay, Hagit; Marai, G. Elisabeta (December 2020, 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
An evolutionarily conserved motif is required for Plasmodesmata-located protein 5 to regulate cell-to-cell movement

https://doi.org/10.1038/s42003-020-1007-0

Wang, Xu; Robles Luna, Gabriel; Arighi, Cecilia Noemi; Lee, Jung-Youn (June 2020, Communications Biology)

Abstract Numerous cell surface receptors and receptor-like proteins (RLPs) undergo activation or deactivation via a transmembrane domain (TMD). A subset of plant RLPs distinctively localizes to the plasma membrane-lined pores called plasmodesmata. Those RLPs include the Arabidopsis thaliana Plasmodesmata-located protein (PDLP) 5, which is well known for its vital function regulating plasmodesmal gating and molecular movement between cells. In this study, we report that the TMD, although not a determining factor for the plasmodesmal targeting, serves essential roles for the PDLP5 function. In addition to its role for membrane anchoring, the TMD mediates PDLP5 self-interaction and carries an evolutionarily conserved motif that is essential for PDLP5 to regulate cell-to-cell movement. Computational modeling-based analyses suggest that PDLP TMDs have high propensities to dimerize. We discuss how a specific mode(s) of TMD dimerization might serve as a common mechanism for PDLP5 and other PDLP members to regulate cell-to-cell movement.
A roadmap for the functional annotation of protein families: a community perspective

https://doi.org/10.1093/database/baac062

de Crécy-Lagard, Valérie; Amorin de Hegedus, Rocio; Arighi, Cecilia; Babor, Jill; Bateman, Alex; Blaby, Ian; Blaby-Haas, Crysten; Bridge, Alan J.; Burley, Stephen K.; Cleveland, Stacey; et al (August 2022, Database)

Abstract Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
UniProt: the Universal Protein Knowledgebase in 2023

https://doi.org/10.1093/nar/gkac1052

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bye-A-Jee, Hema; Cukura, Austra; et al (November 2022, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Full Text Available
UniProt: the universal protein knowledgebase in 2021

https://doi.org/10.1093/nar/gkaa1100

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Agivetova, Rahat; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bursteinas, Borisas; et al (November 2020, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
Full Text Available
The Gene Ontology resource: enriching a GOld mine

https://doi.org/10.1093/nar/gkaa1113

Carbon, Seth; Douglass, Eric; Good, Benjamin M; Unni, Deepak R; Harris, Nomi L; Mungall, Christopher J; Basu, Siddartha; Chisholm, Rex L; Dodson, Robert J; Hartline, Eric; et al (December 2020, Nucleic Acids Research)
Abstract The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
Full Text Available
